[AIROCMLIR-43] tuningRunner improvements - Add state file for crash recovery and improve logging #2208
mirza-halilcevic wants to merge 34 commits into develop
Pull request overview
This PR enhances the tuning infrastructure with state file management for crash recovery and comprehensive logging improvements. The changes enable the tuner to track configuration states (running, failed, crashed, interrupted), persist them across runs, and recover gracefully from interruptions or crashes.
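The state tracking described above could be sketched roughly as follows. This is a minimal illustration, not the code in `tuningRunner.py`: the function names, the flat `{config: status}` JSON layout, the `"done"` status, and the choice to re-run `"interrupted"` configs are all assumptions made for the example. Only the status names `running`, `failed`, `crashed`, and `interrupted` come from the overview.

```python
import json
import os
import tempfile

# Statuses that are only re-run on request (assumed to mirror --retry-failed).
RETRYABLE = {"failed", "crashed"}


def load_state(path):
    """Return the saved state, or an empty dict if no state file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        state = json.load(f)
    # A config still marked "running" means the previous process died
    # while tuning it, so reclassify it as "crashed".
    for config, status in state.items():
        if status == "running":
            state[config] = "crashed"
    return state


def save_state(path, state):
    """Write to a temp file and rename, so a crash mid-write keeps the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp_path, path)


def configs_to_run(all_configs, state, retry_failed=False):
    """Select configs with no recorded status, interrupted ones, and
    (only when retry_failed is set) failed or crashed ones."""
    todo = []
    for config in all_configs:
        status = state.get(config)
        if status is None or status == "interrupted":
            todo.append(config)
        elif retry_failed and status in RETRYABLE:
            todo.append(config)
    return todo
```

In this sketch the runner would mark a config `"running"` and call `save_state` before launching it, then overwrite the status once the config finishes or fails. That ordering is what lets the next run tell a crash apart from a config that was never started.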
Changes:
- Added JSON state file mechanism to track tuning progress and enable crash recovery
- Introduced structured logging with color-coded output and tqdm integration
- Enhanced error reporting with detailed context and formatted output
- Added `--retry-failed` flag to selectively retry failed/crashed configs
- Improved progress tracking with ETA estimation based on median completion times
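A median-based ETA like the one mentioned in the last item might look like the sketch below. The helper names and the `H:MM:SS` output format are hypothetical and are not taken from the PR. The sketch only shows the general technique of multiplying the median per-config duration by the number of configs left.

```python
import statistics


def estimate_eta_seconds(completed_durations, remaining_configs):
    """Estimate seconds left as median per-config time * configs remaining.

    Returns None when nothing has completed yet, since there is no basis
    for an estimate.
    """
    if remaining_configs <= 0:
        return 0.0
    if not completed_durations:
        return None
    return statistics.median(completed_durations) * remaining_configs


def format_eta(seconds):
    """Render an ETA as H:MM:SS, or 'unknown' if no estimate is available."""
    if seconds is None:
        return "unknown"
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:d}:{minutes:02d}:{secs:02d}"
```

The median is less sensitive than the mean to a few configs that run far longer than the rest (for example, ones that hit a timeout). With durations of 10, 12, and 300 seconds and 5 configs remaining, the median gives 60 seconds, while the mean would give roughly 537.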
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| mlir/utils/performance/tuningRunner.py | Core implementation of state management, logging infrastructure, and enhanced error handling |
| mlir/utils/performance/perfRunner.py | Simplified tuning database reader to handle variable column counts |
| mlir/utils/jenkins/Jenkinsfile.downstream | Removed --quiet flag from CI tuning commands |
| mlir/utils/jenkins/Jenkinsfile | Removed --quiet flag from fusion tuning commands |
| mlir/lib/Dialect/Rock/Tuning/RockTuningImpl.cpp | Removed obsolete comment about hidden warning |
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Motivation
Improve crash recovery, informational output, and error reporting.
Technical Details
Test Plan
This branch was used to create the tuning databases from which the quick-tune lists in #2212 were generated.
Test Result
Submission Checklist